#long-context language models · 18/05/2025
Achieving 50.8% on SWE-Bench Using Monolithic Long-Context Language Models Without Tooling
New research shows that powerful long-context language models can reach 50.8% accuracy on the SWE-Bench software engineering benchmark without relying on complex tool scaffolding, a result that simplifies LM agent design.